Abstract

The major advantage of mRNA vaccines over more conventional approaches is their potential for rapid development and large-scale deployment in pandemic situations. In the current COVID-19 crisis, two mRNA COVID-19 vaccines have been conditionally approved and broadly applied, while others are still in clinical trials. However, there is no previous experience with the large-scale use of mRNA vaccines in the general population. This warrants a careful evaluation of mRNA vaccine safety properties that considers all available knowledge on mRNA molecular biology and evolution. Here, I discuss the pervasive claim that mRNA-based vaccines cannot alter genomes. Surprisingly, this notion is widely stated in the mRNA vaccine literature but is never supported by references to primary scientific papers that specifically address this question. This discrepancy becomes even more puzzling if one considers previous work on the molecular and evolutionary aspects of retroposition in murine and human populations, which clearly documents the frequent integration of mRNA molecules into genomes, including in clinical contexts. By performing basic comparisons, I show that the sequence features of mRNA vaccines meet all known requirements for retroposition by L1 elements, the most abundant autonomously active retrotransposons in the human genome. In contrast, I find an evolutionary bias in the set of known retrocopy-generating genes, a pattern that might help in the future development of retroposition-resistant therapeutic mRNAs. I conclude that it is unfounded to assume a priori that mRNA-based therapeutics do not impact genomes, and that the route to genome integration of vaccine mRNAs via endogenous L1 retroelements is easily conceivable. This implies that we urgently need experimental studies that rigorously test for the potential retroposition of vaccine mRNAs. At present, the insertional mutagenesis safety of mRNA-based vaccines should be considered unresolved.


Introduction 

The research and development of mRNA-based therapeutics gained momentum with the onset of the COVID-19 pandemic. Currently, two mRNA vaccines against SARS-CoV-2 (BioNTech/Pfizer BNT162b2 and Moderna mRNA-1273) have been approved for use in the general population in many countries (e.g. 1,2), and several others are under development (3–5). It has often been suggested that the main advantage of mRNA-based vaccines, compared to more conventional approaches, is the possibility of their rapid development and large-scale deployment (6,7), both desirable properties in pandemic situations. The statement that vaccine mRNAs pose no risk of genome integration (e.g. 6,8–12), and consequently no insertional mutagenesis risk, is another commonly listed advantage of mRNA-based vaccines, especially when contrasted with the safety profile of DNA-based therapeutics (10,12,13). This claim prompted me to look more carefully into the mRNA vaccine literature to find a rationale for it. Surprisingly, I was not able to track down any experimental or theoretical study that specifically addresses the possibility of genome integration of mRNA therapeutics.


This shortage of relevant studies is reflected in numerous reviews (4–6,9,10,14–18), book chapters on mRNA vaccines (13,19–22) and documents of international organizations (23–25), which often state that mRNA vaccines do not pose a risk of genome integration but fail to cite any references in support of this idea. Occasionally, some citations are included (e.g. 15,22,26,27), but unfortunately they are circular, as they point to similar unsupported statements (6,10,21,28–30). This signals that the idea that vaccine mRNAs resist genome integration behaves like a meme that self-replicates in the literature, and therefore it should not be considered reliable scientific information. Undoubtedly, there is always a possibility that my literature search missed some important work; however, other researchers have also noticed, although without going into details, the shortage of studies that explicitly deal with the possibility of vaccine mRNA genome integration (13,31–34).


Besides the lack of references, the line of argument for the claim that genome integration of vaccine mRNA molecules is impossible, or negligible, is rather limited in the vast majority of papers. Many of them simply state that vaccine mRNA cannot integrate into the host genome without explaining why (3,10,12,19–22,26,30). Others briefly note that vaccine mRNAs remain in the cytoplasm of host cells, in contrast to DNA-based vaccines that must enter the nucleus to be effective, and thus have no opportunity to change the genome (4,9,18,27,35).


Recently, some papers have argued that the relatively short persistence of mRNA makes genome integration of mRNA vaccines improbable (4,13,27). However, some of them also recognize the possibility of genome integration if vaccine mRNA is reverse-transcribed in host cells (4,13,31). Human endogenous retroviruses (HERVs) and retroviral infections (e.g. HIV) are mentioned as possible sources of enzymes for reverse transcription and genome integration, with the conclusion that the integration risk is still highly unlikely (4,31). In contrast, some authors are more cautious and suggest that investigation may be needed to clarify whether vaccine mRNA integration can occur (13).


The biology of retroposition 

Nevertheless, this discussion within the vaccinology field of vaccine mRNA genome integration risks is rather brief and surprisingly incomplete, as it does not consider the accumulated knowledge on the biology of retroposition (36–40). In many eukaryotes, cellular mRNAs of various genes are endogenously reverse-transcribed and reintegrated into the genome, yielding their retrocopies (Fig. 1b) (36,38–40). This process of mRNA-mediated gene duplication is highly frequent in therian mammals (41) and is best studied in primates and mice (36–38,40). Of note, the term retrocopy is often interchanged with related terms such as processed pseudogenes, retrotransposed pseudogenes, retropseudogenes, retroposed gene copies, retroCNVs and retrogenes, as the terminology related to retroposition is not yet fully settled (38,39).
Figure 1. L1-mediated retroposition. A) Retroposition cycle of L1 elements. An active L1 element is transcribed in the nucleus, and the resulting L1 mRNA is transported to the cytoplasm, where it undergoes translation (42,43). L1 mRNA codes for ORF1 and ORF2 proteins, which preferentially associate with L1 mRNA (cis-preference) to form the L1 ribonucleoprotein particle (L1 RNP) (42–44). ORF1p is an RNA binding protein with chaperone activity, while ORF2p functions as a reverse transcriptase and endonuclease (45,46). By a yet unresolved mechanism, the L1 RNP, which contains at least L1 mRNA and ORF2p, enters the nucleus. In the nucleus, L1 mRNA is reverse transcribed and integrated into the genome by the process of target-primed reverse transcription (TPRT) (43,45–47). The retroposition mechanism relies on the binding of ORF2p to the L1 mRNA poly-A tail (46,48–50). There is some evidence that cells can take up extracellular vesicles (EVs) containing L1 mRNA, which can then undergo translation and retroposition (51). B) L1-mediated retroposition of protein coding genes. A parental protein coding gene is transcribed in the nucleus. The resulting pre-mRNA is processed, and the mature parental gene mRNA is then transported to the cytoplasm. L1 proteins (ORF1p and ORF2p) interact with parental gene mRNA by a process termed trans-association to form a parental gene ribonucleoprotein particle (parental gene RNP) (36,43,44,47). Similar to the L1 RNP, the parental gene RNP enters the nucleus, where parental gene mRNA is reverse transcribed and integrated into the genome by TPRT. The poly-A tail of parental gene mRNA plays a crucial role in this process (36,48–50). C) Hypothetical L1-mediated retroposition of vaccine mRNA. Vaccine mRNA formulated in lipid nanoparticles (LNPs) enters the cell by endocytosis (1,2,6,10,52). A fraction of vaccine mRNA enters the cytosol via endosomal escape; the rest is degraded in endosomes (52) or is repackaged in multivesicular endosomes into extracellular vesicles (EVs) and secreted back into the extracellular space (53). Neighboring or distant cells can take up vaccine mRNA from these EVs (53,54). L1 proteins (ORF1p and ORF2p) interact with vaccine mRNA by trans-association to form a vaccine mRNA ribonucleoprotein particle (vaccine mRNA RNP) (36,43,44,47). Like L1 and parental gene RNPs, the vaccine mRNA RNP enters the nucleus, where vaccine mRNA is reverse transcribed and integrated into the genome by TPRT. The poly-A tail of vaccine mRNA plays a crucial role in this process (36,48–50).


Depending on the annotation methodology, the estimated number of retrocopies in the human genome varies, but the figures in most studies revolve around 8,000 (38,39,55,56), and these retrocopies are derived from around 2,500 parental genes (55,57), i.e. genes whose mRNAs are reverse transcribed and integrated into the genome (Fig. 1a, b). These values are similarly high in all screened therian mammals and reflect endogenous retroposition activity during ~200 My of their evolution (41,57). However, the continuous activity of retroposition is also apparent in extant human populations, where substantial polymorphism of novel retrocopies has been revealed (37,56,58–60). For instance, it was estimated that an individual harbors on average six novel retrocopies that are absent from the human reference genome, and that these retrocopies were derived from a pool of 503 unique parental genes (37). These values indicate rather high retroposition activity in present human populations.


A recent study in mice suggests that the actual rate of retrocopy generation in extant populations is even higher and possibly similar between humans and mice (40); hence it is not surprising that retrocopy variation is detected in medical contexts (61,62). However, it has also been suggested that, due to the use of unoptimized analytical pipelines, many retrocopies have been overlooked in routine genetic testing (40,61). At present, there are several documented cases of retrocopy emergence related to diseases in animals (47,61,63), and one case of a pathogenic retrocopy in humans (47,61,64,65), but more can be expected to be discovered (40). Indeed, it seems that retrocopy variation in human populations might be more phenotypically relevant and population-specific than single nucleotide polymorphisms (37,40), and that most newly transposed retrocopies have a deleterious impact (40). All of this suggests that the mutation load arising from retroposition activity in extant human populations is medically relevant.


Regardless of the initial selective purge (40), retrocopies are a source of novel genes with adaptive significance that contribute to human biology and health (36,39). Previously, retrocopies were viewed as nonfunctional remnants of evolutionary turnover, termed processed pseudogenes (39), mainly because it was presumed that retrocopies inherently lack transcription-driving elements and thus could not be transcribed (39–41). A similar argument has recently been raised in the vaccinology field when the possibility of vaccine mRNA genome integration and its impact on phenotypes is discussed (13). However, after it was realized that most regions of a mammalian genome are transcribed (66–68), and that retrocopies can easily gain their own regulatory elements (36,38,40,41), it became apparent that most retrocopies show evidence of transcription (38,40,41).


These transcribed retrocopies are thus a source of evolutionary innovations, as they can be further transformed into novel protein coding or RNA retrogenes (36,38,41,69). Several hundred RNA and several hundred protein coding retrogenes are estimated to be active in humans and mice (36,38). For most of them, functional significance has yet to be determined, but some are known to be human disease genes (70,71) or to have discernible phenotypes (36,38).


Many of the retrocopies I have discussed so far are vertically transmitted through the germline, but mRNA retroposition also occurs in somatic tissues. Somatic retroposition is substantially less studied, but it is known to be common in cancer tissues (58,72–75) and to occur during early development (64,65). Moreover, the activity of endogenous retroelements that drive retroduplication in humans suggests that mRNA retroposition events should be found in other somatic tissues as well (see below). This indicates that retrocopies continuously reshape the human genome, not only at the population level and on deeper evolutionary time scales, but also in somatic tissues during individual development. It is therefore important to consider the endogenous mechanisms of retroposition in humans when the genomic integration probability of mRNA vaccines is evaluated.


The mechanisms of retrocopy formation 

The mechanism that leads to the formation of retrocopies in the human lineage is relatively well studied and predominantly involves long interspersed element-1 (LINE-1 or L1) retrotransposons (Fig. 1a) (36,38,40,44,76), although there is some evidence that retroposition through long terminal repeat (LTR) retrotransposons is also possible (38,76). L1 retroelements are around 6 kb long, make up 17 percent of the human genome, and around one hundred of them remain active, spreading their copies in the genome by retroposition of their own mRNA (Fig. 1a) (42,43,47,77–80). When transcribed, L1 produces a bicistronic mRNA that codes for two proteins: ORF1p is an RNA binding protein with chaperone activity, while ORF2p functions as a reverse transcriptase and endonuclease (42,43,45–47,79,80). Together with L1 mRNA, these proteins assemble in the cytoplasm into an L1 ribonucleoprotein particle (L1 RNP), which can then enter the nucleus (Fig. 1a) (42,43,45–47,79,80).


In the nucleus, L1 mRNA is eventually reverse transcribed and integrated into the genome at A/T-rich consensus target sites by a process termed target-primed reverse transcription (TPRT) (Fig. 1a) (43,45–47). In the antisense direction, L1 also codes for ORF0p, a small peptide that localizes to the nucleus and enhances the efficiency of retrotransposition (47,81). During the L1 lifecycle, diverse host proteins interact with L1 RNPs, promoting or suppressing their retrotransposition (47,82). The L1 protein machinery preferentially targets its own encoding mRNA (cis-preference), but it can also mobilize a variety of other RNAs present in the cell (trans-association), including non-autonomous mobile elements (Alu, SVA), spliceosomal RNAs and diverse protein coding mRNAs (Fig. 1b) (43,44,47,78,83).


This relaxed retroposition behavior of L1 elements, which allows mobilization of various mRNAs through trans-association, is responsible for the massive accumulation of non-autonomous mobile elements and retrocopies in genomes (Fig. 1b). The question arises how L1 elements achieve such promiscuous performance. The underlying reason is linked to the L1 retroposition mechanism, which is contingent on ORF2p binding to the poly-A tail during RNP formation in the cytoplasm (Fig. 1) (48,49). Subsequently, in the nucleus, genome integration also relies on the poly-A tail, which permits flexibility in DNA priming at the target site during TPRT (46,50). Given that poly-A tails are nonspecific, low-complexity sequences that are almost ubiquitously present at the 3' ends of cellular mRNAs (84), this implies that in principle every mRNA could be a target of the L1 protein machinery and undergo TPRT (Fig. 1c).


However, a complete lack of retroposition specificity would significantly lower the fitness of L1 elements and compromise their parasitic proliferation in genomes. To avoid this scenario, L1 elements preferentially target their own mRNA despite the poly-A tail dependence (44,85,86). A popular model that tries to explain the mechanism of this cis-preference envisages that, during translation, emerging L1 proteins associate immediately at the ribosome with their encoding mRNA (42,45,48,87). This or a similar process presumably maintains the balance between the parasitic reproduction of L1 elements and the occasional mobilization of diverse mRNAs by trans-association via poly-A tracts (Fig. 1).


L1 elements in germline and soma 

The overall dynamics of L1 retroelements make them important contributors to genetic variation within and between individuals, with implications for evolution and disease in humans (43,80,88). Interaction between the host genome and L1 elements is multilayered, with beneficial and detrimental effects on host fitness (88–93). For this reason, host cells have evolved various mechanisms to keep L1 activity in balance (88,91,94–99). Regardless of these host protection mechanisms, a new retroposition event mediated by L1 elements must occur in the germline to be passed to the next generation (92).


The mere presence of numerous vertically inherited L1 elements, non-autonomous mobile elements and retrocopies in human genomes provides direct evidence that their mobilization repeatedly occurs in the germline (94). It is also well established that L1 activity contributes to ongoing germline mutagenesis (100,101). However, the precise dynamics of retroposition during the germline lifecycle are less clear (91,92,102,103). Current data suggest that L1 elements show expression and retroposition activity in testes (91,100,101,104), spermatozoa (105,106), ovaries (100,101), oocytes (107) and early embryos (92,94,100,102,103,108).


Although it was initially thought that L1 elements are mainly active in the germline, accumulated evidence suggests that they should also be considered an endogenous mutagen in somatic tissues (94,95,101,109). L1 elements are expressed in diverse human somatic tissues, including liver, spleen, adrenal glands, lungs, heart and brain (101), as well as in lymphoblastoid cell lines (110), platelets, megakaryocytes and T cells (93). Expression and retroposition activity of L1 elements have also been detected in vascular endothelial cells (104,111). However, somatic L1 retroposition has been extensively studied only in the brain, cancer tissues and the gastrointestinal tract (43,73).


During both embryonic and adult neurogenesis, L1 retroposition activity generates significant neuronal mosaicism (56,94,112–116) that further increases in neurological disorders (116,117). L1 retroposition occurs in diverse cell types of the central nervous system, including glial cells, neuronal progenitor cells, differentiating neurons and mature non-dividing neurons (113,116,118–121). It has been speculated that L1-driven somatic mosaicism may alter the functional properties of neural cells and that many of them may contain a unique genome (113,121). However, the biological and medical significance of this mosaicism is not fully clear (115–117).


L1 elements are also highly expressed in many human cancers, where they function as an endogenous mutagen and can drive mutations in tumorigenesis (79,80). Epithelial cancers seem to be particularly prone to L1 retroposition (43,73). Interestingly, L1 insertions are found in tumor cells as well as in normal cells of the liver, stomach, colon and esophagus (122–125), suggesting widespread somatic activity of L1 elements in the gastrointestinal tract. In general, somatic L1 retroposition is highly ontogeny-dependent and strongly increases with advanced age due to L1 transcriptional derepression (99,126). In addition to endogenous regulation, the activity of L1 elements is sensitive to exogenous signals and can be induced by numerous environmental factors (88,94,95,109,117). Taken together, it is clear that human germline cells and many somatic cells have a lasting potential for L1-mediated retroposition by cis-preference and trans-association (Fig. 1).


Vaccine mRNAs and retroposition 

Evidently, various mRNAs in humans can be reverse transcribed and integrated into the genome via L1 retroelements, with negative effects on fitness. However, this does not necessarily imply that the same will happen to vaccine mRNAs. A definitive answer will come from experiments and population monitoring, but for now it is helpful to consider their described properties and evaluate them against the L1 retroposition mechanism (Fig. 1). The active substance of the BNT162b2 vaccine is a 4,284-nucleotide synthetic mRNA molecule that contains N1-methylpseudouridine (m1Ψ), a modified nucleoside that substitutes for naturally occurring uridine (1,127,128). This nucleoside modification reduces the innate immune response to exogenous mRNA molecules and enhances their translation (6,129–131). Structurally, BNT162b2 mRNA consists of a 5' cap analogue, a 5' untranslated region, a codon-optimized SARS-CoV-2 spike protein coding sequence, a 3' untranslated region and a 110-nucleotide poly-A tail (1,52,127,128). These structural elements follow the usual eukaryotic mRNA architecture and help to increase the RNA stability and translational efficiency of mRNA vaccines (6,10,28,128). In contrast to BNT162b2, the exact mRNA sequence of the mRNA-1273 vaccine does not seem to be publicly disclosed (52). However, its general design is similar to that of BNT162b2 mRNA, including the use of m1Ψ instead of uridine, the presence of a 5' cap structure, a 5' untranslated region, a codon-optimized spike protein coding sequence, a 3' untranslated region and a poly-A tail (2,132).


From the perspective of their sequence arrangement, the BNT162b2 and mRNA-1273 synthetic mRNA molecules appear to be suitable targets for L1 retroposition in trans, because they structurally and functionally mimic the architecture of native mRNAs that occur in the cytoplasm of eukaryotic cells (6,10). In this regard, probably the most important sequence feature is their poly-A tail, which is known to be required for L1-mediated retroposition (Fig. 1) (49). However, the available information on the engineering logic of vaccine mRNAs reveals that they were not specifically constructed to avoid capture by the L1 retroposition machinery (1,2,6,10,52). In fact, it seems that no study in the mRNA vaccine field has considered this possibility (e.g. 4,6,10,13,31). For instance, the poly-A tail of BNT162b2 mRNA contains a 10-nucleotide linker sequence flanked by 30- and 70-nucleotide adenosine tracts (127). Nevertheless, this poly-A tail modification, which helps to increase translational efficiency (128,133), is unlikely to affect the retroposition propensity of the vaccine mRNA, because only nucleotide changes directly neighboring the 3' end of the poly-A tail are known to have a significant impact on the L1 retroposition mechanism (49,50,97). Moreover, non-adenosine nucleotides at the 3' end of the poly-A tail are generally avoided in mRNA therapeutics because they hamper translational efficiency (134). Similarly, given the total number of modified nucleotides per mRNA molecule, the m1Ψ ribonucleoside modification is perhaps the most striking artificial feature of vaccine mRNAs; however, such ribonucleoside modifications generally do not prevent reverse transcription (135).
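The tail layout described above can be made explicit with a minimal Python sketch. It assumes only the segment lengths given in the text (30 A, a 10-nucleotide linker, 70 A); the actual linker sequence is not reproduced here, so it is represented by a hypothetical placeholder of ten "N" characters, and the helper names `build_segmented_tail` and `terminal_a_run` are illustrative, not part of any published pipeline.

```python
# Hypothetical sketch: segment lengths follow the text (30 A + 10-nt linker + 70 A).
# The linker is a placeholder ("N" * 10), NOT the real BNT162b2 linker sequence.
LINKER_PLACEHOLDER = "N" * 10


def build_segmented_tail(first_a=30, linker=LINKER_PLACEHOLDER, second_a=70):
    """Return a poly-A tail made of two adenosine tracts separated by a linker."""
    return "A" * first_a + linker + "A" * second_a


def terminal_a_run(seq):
    """Length of the uninterrupted adenosine run at the 3' end of a sequence."""
    return len(seq) - len(seq.rstrip("A"))


tail = build_segmented_tail()
print(len(tail))             # total tail length in nucleotides
print(terminal_a_run(tail))  # adenosines directly at the 3' end
```

The sketch only restates the arithmetic: the linker sits 70 nucleotides upstream of the 3' end, so the 3'-terminal adenosine run, the region reported to matter for L1-mediated retroposition, stays uninterrupted.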


Parental genes and BNT162b2 
In a comparative context, genes known to actively generate retrocopies (parental genes) in extant populations (Fig. 1b) are the best reference for assessing general mRNA sequence trends related to retroposition. However, the collective properties of parental genes have not been extensively analyzed. Some studies report that parental genes are enriched in translation, ribosome, intracellular lumen and cell division related functional categories (37,58,60), and that they have a weak tendency to be highly expressed (37), but a more detailed analysis is still missing. It is therefore helpful to explore some basic sequence properties of mRNAs transcribed from parental genes known to actively generate retrocopies in extant populations (37,40), and then to relate this information to the vaccine mRNA sequence that is publicly available (i.e. BNT162b2).


The current estimate of 503 parental genes in humans (37) is lower than in mice, where 1,663 of them were recovered (40). However, the study in mice, which used an improved retrocopy detection pipeline and higher sequencing depths, found that the number of parental genes had not reached saturation; thus the actual number of parental genes should be expected to be higher, especially in humans (40). Regardless of this inherent incompleteness, the available datasets show that both mouse and human parental genes have a broad distribution of mRNA lengths (Fig. 2a, b). It is also evident that the mRNAs of parental genes tend to be slightly longer than the average for all protein coding genes (Fig. 2a, b). With the caveats that I considered only the longest splicing variant per gene, and that shorter and intronless genes might be overlooked in retrocopy/parental gene detection pipelines, this result suggests that L1-mediated retroposition in trans is modulated to some extent by parental gene mRNA length. In any case, the length of BNT162b2 mRNA falls very close to the average mRNA length of parental genes (Fig. 2a, b), indicating that its length will likely not be an obstacle to retroposition.
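The length comparison above keeps only the longest splicing variant per gene before averaging. The Python sketch below illustrates that step and a simple empirical percentile for the 4,284-nucleotide BNT162b2 mRNA; the gene names and transcript lengths are invented toy values, not data from the study, and the helper functions are hypothetical.

```python
from statistics import mean

# Toy transcript lengths (nt) grouped by gene; invented values for illustration only.
transcripts = {
    "GENE_A": [1200, 3500, 2900],
    "GENE_B": [4100],
    "GENE_C": [5000, 4800],
    "GENE_D": [2600, 6100],
}

BNT162B2_LENGTH = 4284  # nucleotides, as stated in the text


def longest_per_gene(tx):
    """Keep only the longest splicing variant (transcript) of each gene."""
    return {gene: max(lengths) for gene, lengths in tx.items()}


def fraction_below(values, x):
    """Share of values strictly smaller than x (a simple empirical percentile)."""
    values = list(values)
    return sum(v < x for v in values) / len(values)


lengths = longest_per_gene(transcripts)
print(mean(list(lengths.values())))  # average length of the longest variants
print(fraction_below(lengths.values(), BNT162B2_LENGTH))
```

In a real analysis the input would come from a genome annotation such as Ensembl, and `fraction_below` could be applied separately to the parental gene set and to all genes.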
Figure 2. The basic sequence properties of BNT162b2 mRNA are within the range of parental genes that generate retrocopies. The jitter plots show parental genes (blue dots) and all genes (gray dots) randomly distributed along the x-axis. The red triangle shows BNT162b2 mRNA values. The significance of the difference between the parental gene average (blue dashed line) and the all-gene average (gray solid line) is tested by a permutation test (two-tailed, 10° permutations). The initial lists contained 503 human (37) and 1,663 mouse parental gene names (40). All mouse and 496 human parental gene names were successfully linked to the sequence data. Poly-A tail lengths were obtained for 7,760 (organoids, replicate 1) and 9,132 (iPSCs, replicate 1) human genes by averaging multiple estimates per gene (84). A) Comparison of cDNA lengths in mice (p = 0; 22,770 all genes, 1,663 parental genes, Ensembl GRCm38.86). B) Comparison of cDNA lengths in humans (p = 0; 22,964 all genes, 496 parental genes, Ensembl GRCh38.86). C) Comparison of GC content in mice (p = 0.00021; 22,770 all genes, 1,663 parental genes, Ensembl GRCm38.86). D) Comparison of GC content in humans (p = 0; 22,964 all genes, 498 parental genes, Ensembl GRCh38.86). E) Comparison of poly-A tail lengths in human iPSC-derived cerebral organoids (p = 0.69; 7,760 all genes, 330 parental genes, Ensembl GRCh38.84). F) Comparison of poly-A tail lengths in human induced pluripotent stem cells (iPSCs) (p = 0.26; 9,132 all genes, 369 parental genes, Ensembl GRCh38.84).
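The caption specifies a two-tailed permutation test but not the exact resampling scheme. One common way to implement it, assumed here rather than taken from the study, is to repeatedly draw random gene sets of the same size as the parental gene set from all genes and count how often their mean deviates from the all-gene mean at least as much as the parental gene mean does. The function name, the default number of permutations and the fixed seed are illustrative choices.

```python
import random
from statistics import mean


def permutation_test_subset_mean(all_values, subset_values, n_perm=10_000, seed=0):
    """Two-tailed permutation test: is the subset mean unusually far from the
    mean of all values? (hypothetical helper)

    The null distribution is built by drawing random subsets of the same size,
    without replacement, from all values."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    all_values = list(all_values)
    k = len(subset_values)
    overall = mean(all_values)
    observed = abs(mean(subset_values) - overall)
    extreme = 0
    for _ in range(n_perm):
        sample = rng.sample(all_values, k)
        if abs(mean(sample) - overall) >= observed:  # two-tailed: absolute deviation
            extreme += 1
    return extreme / n_perm


# Toy example (invented numbers): a subset taken from the top of the range.
all_genes = list(range(1000))
shifted_subset = list(range(900, 1000))
print(permutation_test_subset_mean(all_genes, shifted_subset, n_perm=2000))
```

Without a +1 correction in numerator and denominator, the estimated p-value can be exactly 0 when no random set is as extreme as the observed one, which is consistent with how several values are reported in the caption; adding the correction would give a small non-zero bound instead.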


To improve their translation and stability, vaccine mRNAs are frequently sequence- and/or codon-optimized (1,6,52,136), and this optimization could affect GC content. Hence, to see whether the GC content of BNT162b2 mRNA falls outside the range of parental genes, I explored their GC content in mice and humans. As with mRNA length, the GC content of parental genes shows a broad range of values (Fig. 2c, d). In mice, the average GC content of parental genes is almost equal to the genome average (Fig. 2c), whereas in humans parental genes tend to have a slightly lower average GC content (Fig. 2d). Although the GC content of BNT162b2 mRNA is higher than the parental gene average, it is well within their range (Fig. 2c, d); thus it is unlikely that peculiarities of BNT162b2 GC content will prevent its retroposition.
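GC content is simply the share of G and C among the bases of a sequence. A minimal Python version is sketched below; it assumes sequences are written with canonical letters (A, C, G, T or U), so a modified nucleoside such as m1Ψ would need to be written as U first, and ambiguous symbols like N are ignored. The function name is hypothetical.

```python
def gc_content(seq):
    """Percentage of G and C among canonical bases (A, C, G, T, U), case-insensitive.

    Non-canonical symbols (for example N) are skipped, so modified nucleosides
    must be written as their canonical base (m1Ψ as U) before calling this."""
    bases = [b for b in seq.upper() if b in "ACGTU"]
    if not bases:
        raise ValueError("sequence contains no canonical bases")
    gc = sum(b in "GC" for b in bases)
    return 100.0 * gc / len(bases)


print(gc_content("AUGGCCAUUGUAAUGGGCCGC"))  # toy RNA fragment, not a vaccine sequence
```

Applied to the cDNA of each gene's longest transcript, it yields one value per gene, which can then be compared between parental genes and all genes as in Fig. 2c, d.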


The mRNA sequences analyzed so far correspond to bioinformatic cDNA sequences, i.e. coding sequence plus untranslated regions, excluding the poly-A tail. Poly-A tails are commonly not considered in genome-based analyses because they are added post-transcriptionally, and it has been technically challenging to recover their nucleotide sequence precisely. However, poly-A tail sequencing approaches at the transcriptome scale are continuously improving, and recently produced datasets provide an opportunity to gain insight into the distribution of tail lengths (84). Here I explored poly-A tail lengths estimated using FLAM-seq in human induced pluripotent stem cells (iPSCs) and iPSC-derived cerebral organoids (84). I found no difference between the average poly-A tail lengths of known parental genes and of all coding genes (Fig. 2e, f). The distribution of parental gene poly-A tail lengths is rather broad (Fig. 2e, f), indicating that the L1 machinery is mostly insensitive to variation in poly-A tail length. The 110-nucleotide BNT162b2 poly-A tail is well within the range of these values, so no specific difficulties in retroposition related to poly-A tail length are expected. At this point, it is worth mentioning that a poly-A tail is present in other mRNA vaccine candidates as well (5,137,138).
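The per-gene tail lengths in Fig. 2e, f were obtained by averaging multiple estimates per gene. The sketch below shows that aggregation and the simple min-max range check implied by the text for the 110-nucleotide BNT162b2 tail. The records are hypothetical, not FLAM-seq measurements, and the helper names are illustrative.

```python
from collections import defaultdict
from statistics import mean

BNT162B2_TAIL = 110  # nucleotides, as stated in the text


def average_per_gene(records):
    """Collapse (gene, tail_length) estimates into one mean tail length per gene."""
    by_gene = defaultdict(list)
    for gene, length in records:
        by_gene[gene].append(length)
    return {gene: mean(lengths) for gene, lengths in by_gene.items()}


def within_range(value, values):
    """True if value lies between the smallest and largest of values (inclusive)."""
    values = list(values)
    return min(values) <= value <= max(values)


# Hypothetical per-read estimates (gene, tail length in nt); not FLAM-seq data.
records = [("G1", 60), ("G1", 80), ("G2", 120), ("G2", 140), ("G3", 95)]
gene_means = average_per_gene(records)
print(gene_means)
print(within_range(BNT162B2_TAIL, gene_means.values()))
```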


Parental genes show evolutionary bias 

This simple ad hoc comparative analysis, which covers the length, GC content and poly-A tail length of parental genes that actively produce retrocopies in extant populations (Fig. 2), could be expanded by considering other datasets and sequence traits, or by using more sophisticated analytical approaches. However, its main purpose is to show that effectively any poly-A-tailed mRNA in human cells, including vaccine mRNAs, has some chance of being integrated into the genome by the L1 machinery. I hope this will prompt experimental studies that establish with certainty whether a particular mRNA species is retroposition-proof and uncover the mechanistic reasons for such behavior (139). On the other hand, we and others have previously shown that computational macroevolutionary analyses of gene sets linked to disease and other phenotypes can bring unexpected insights (140–144), with predictive power that can guide experiments (145–148). This approach could also be applied to the currently available sets of parental genes that actively produce retrocopies; however, it appears that this has not been done so far (37,40). To fill this void, at least in part, I performed a pilot macroevolutionary analysis here.


To see whether the sets of parental genes that actively generate retrocopies in humans and mice (37,40) show an evolutionary bias, I analyzed the phylogenetic origin of their protein sequences using the phylostratigraphic approach (Fig. 3). The enrichment profiles on the phylostratigraphic maps show that, although the protein sequences of parental genes can be traced back to a wide range of phylogenetic levels (phylostrata, ps), they tend to be evolutionarily old (Fig. 3). I found significant enrichments among genes that are common to all cellular life (ps1, Fig. 3), genes that originated in archaea (ps2, Fig. 3), and those that emerged at the origin of eukaryotes (ps4, Fig. 3). This result suggests that evolutionarily ancient genes, for reasons yet unknown, tend to have higher retroposition rates in present populations. It also reveals some predictability in the patterns of endogenous mRNA retroposition. In future work, this bias could serve as a starting point in the search for underlying factors that correlate with gene age and directly promote or limit mRNA retroposition in mice and humans. Transcription levels, cellular localization, translation rates, various sequence features, and mRNA regulation and stability are some of the possible factors that could be contrasted between the ancient phylostrata enriched in parental genes (ps1, ps2, ps4) and the younger phylostrata depleted of them (ps9–ps24). Ideally, a better understanding of these or other factors could eventually guide experiments and help in engineering retroposition-resistant therapeutic mRNAs.



















[Figure 3 plot. Upper panel: log-odds of parental gene frequency per phylostratum (ps1 to ps24) for M. musculus (red) and H. sapiens (blue), with significance marked by asterisks. Lower panel: overlapped mouse and human consensus phylogenies from cellular organisms (ps1) through Eukaryota (ps4) and Metazoa (ps9) to Rodentia/Primates (ps23) and M. musculus/H. sapiens (ps24).]

Figure 3. The parental genes that generate retrocopies in human and mouse populations tend to be evolutionarily ancient. The phylostratigraphic maps of human and mouse protein-coding genes are generated using the corresponding consensus phylogenies containing 24 internodes (phylostrata, ps). To simplify the presentation of the phylostratigraphic results, the human and mouse phylogenies are overlapped and shown in the lower panel. The two phylogenies differ only in the last two phylostrata (ps23, ps24), i.e., the Rodentia-M. musculus vs. Primates-H. sapiens lineages. Protein sequences of all human (Ensembl GRCh38.86) and mouse (Ensembl GRCm38.86) genes are compared by BLAST against the corresponding custom reference database (e-value 0.001) and mapped on the respective phylogeny using the phylostratigraphic approach (140,142,145,148). The distributions of human parental genes (483, blue numbers; (37)) and mouse parental genes (1659, red numbers; (40)) across phylostrata are shown at the top of the upper panel. The log-odds chart in the upper panel shows the deviation from the expected frequency of parental genes in humans (blue line) and mice (red line). The significance of these deviations is tested by the two-way hypergeometric test adjusted for multiple comparisons (*P < 0.05; **P < 0.01; ***P < 0.001). The gray-shaded phylostrata (ps1, cellular organisms; ps2, Archaea/Asgard Archaea/Eukaryota; ps4, Eukaryota) are enriched for parental genes. Starting with Metazoa (ps9), evolutionarily more recent phylostrata show a significant depletion of parental genes. This phylostratigraphic pattern is effectively unchanged across e-value cut-offs from 1 to 10, and can therefore be considered fairly robust (148).
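
The per-phylostratum statistics named in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the author's pipeline: the log-odds is taken here as the odds ratio of being a parental gene inside versus outside a phylostratum (with 0.5 pseudocounts), the "two-way" test is approximated by doubling the smaller hypergeometric tail, and a Bonferroni correction stands in for the unspecified multiple-comparison adjustment. The function names and the toy counts are hypothetical.

```python
from math import comb, log


def hypergeom_tails(k, K, n, N):
    """Return (P[X >= k], P[X <= k]) for a hypergeometric X.

    N: all genes on the map, K: parental genes among them,
    n: genes in one phylostratum, k: parental genes in that phylostratum.
    """
    total = comb(N, n)
    support = range(max(0, n - (N - K)), min(n, K) + 1)
    pmf = {x: comb(K, x) * comb(N - K, n - x) / total for x in support}
    upper = sum(p for x, p in pmf.items() if x >= k)  # enrichment tail
    lower = sum(p for x, p in pmf.items() if x <= k)  # depletion tail
    return upper, lower


def log_odds(k, K, n, N, pseudo=0.5):
    """Log odds ratio of being a parental gene inside vs. outside the stratum."""
    a, b = k + pseudo, (n - k) + pseudo
    c, d = (K - k) + pseudo, (N - n) - (K - k) + pseudo
    return log((a / b) / (c / d))


def phylostratum_profile(parental_per_ps, genes_per_ps):
    """Return one (log_odds, adjusted_p) pair per phylostratum."""
    N, K = sum(genes_per_ps), sum(parental_per_ps)
    m = len(genes_per_ps)  # number of tests, used for Bonferroni
    profile = []
    for k, n in zip(parental_per_ps, genes_per_ps):
        upper, lower = hypergeom_tails(k, K, n, N)
        p_two_sided = min(1.0, 2.0 * min(upper, lower))
        profile.append((log_odds(k, K, n, N), min(1.0, p_two_sided * m)))
    return profile


# Toy counts for three hypothetical phylostrata (not data from Fig. 3).
profile = phylostratum_profile([30, 10, 5], [100, 100, 100])
for ps, (lo, p) in enumerate(profile, start=1):
    print(f"ps{ps}: log-odds={lo:+.2f}, adjusted P={p:.2g}")
```

With these toy counts the first stratum comes out enriched (positive log-odds) and the third depleted (negative log-odds), which is the kind of pattern the log-odds chart of Fig. 3 summarizes.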


Pharmacology aspects 

Synthetic mRNAs have rather complex pharmacology that depends on their nucleotide sequence, formulation and administration route (10,52,149). Besides the nucleotide sequence, the likelihood of synthetic mRNA genome integration via L1 elements depends on the mRNA's distribution in tissues and organs, and ultimately on its concentration and stability in the cell cytosol. The quantity of synthetic mRNA in a single dose is the initial factor that determines the pharmacokinetics and pharmacodynamics of mRNA vaccines (10,149), so it is helpful to consider the declared values for BNT162b2 and mRNA-1273. A single 30 µg BNT162b2 dose (1,150) contains around 1.3 × 10¹³ synthetic mRNA molecules. If we ignore the loss of vaccine mRNAs on the route to the cytosol and assume their homogeneous distribution among roughly
5 × 10¹¹ nucleated cells in the human body (151), then every nucleated cell could receive about
26 mRNA copies. This is a substantial amount compared with expressed human protein-coding genes, which have on average 25 mRNA copies per cell (152). These values show that the quantity of vaccine mRNA delivered in a single dose of BNT162b2 is large enough to theoretically reprogram the transcriptome of every human cell that can, in principle, undergo retroposition. The undisclosed sequence of the mRNA-1273 vaccine prevents a similar calculation, but assuming that its sequence length and nucleotide composition are comparable to those of BNT162b2 (2,5,52), the number of mRNA molecules per nucleated cell is possibly even higher, because a single dose of the mRNA-1273 vaccine contains 100 µg of synthetic mRNA (2,52). This calculation provides the theoretical upper bound of vaccine mRNA cellular uptake; the lower bound, however, is much more challenging to estimate because of the complex pharmacology of synthetic mRNAs (10) and the rather limited data in the literature (1,2,52,150).
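
The molecule count per dose follows from simple mass-to-moles arithmetic. The Python sketch below assumes a BNT162b2 length of 4284 nucleotides (the commonly reported length, not stated in this section) and an average nucleotide residue mass of about 330 g/mol; both are approximations, so the result should be read as an order of magnitude. For mRNA-1273 it adopts the same assumption as the text, namely a length and composition comparable to BNT162b2.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_per_dose(mass_ug, length_nt, mean_residue_g_per_mol=330.0):
    """Approximate number of mRNA molecules in a dose.

    mean_residue_g_per_mol is an assumed average mass of one nucleotide
    residue; modified nucleosides and sequence composition shift it slightly.
    """
    molar_mass = length_nt * mean_residue_g_per_mol  # g/mol of one molecule
    moles = mass_ug * 1e-6 / molar_mass
    return moles * AVOGADRO


BNT162B2_LENGTH_NT = 4284  # assumption: commonly reported BNT162b2 length

bnt = molecules_per_dose(30, BNT162B2_LENGTH_NT)
# mRNA-1273: sequence undisclosed; assume a comparable length, as the text does.
m1273 = molecules_per_dose(100, BNT162B2_LENGTH_NT)
print(f"BNT162b2, 30 µg:    {bnt:.2e} molecules")
print(f"mRNA-1273, 100 µg: {m1273:.2e} molecules")
```

Under these assumptions the 30 µg dose lands near 1.3 × 10¹³ molecules, and the 100 µg dose scales up by the mass ratio of 10/3.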

After intramuscular inoculation, BNT162b2 and mRNA-1273 mRNA molecules should reach the cell cytosol, where they are translated into SARS-CoV-2 spike proteins that eventually elicit the protective immune response (1,2,52,149,153). On the way from the entry site to the cell cytosol, naked and unmodified mRNAs would mostly be degraded by the omnipresent extracellular ribonucleases (5,6,10,154). The remaining mRNAs that eventually enter the cell through endocytosis predominantly end up entrapped in endosomes and degrade over time (10,52,153,154). On top of this, naked mRNAs with unmodified nucleosides are detected in the endosome and cytosol by pattern recognition receptors, which, by triggering interferon signaling and other pathways, promote RNA degradation, induce inflammation, and inhibit translation and replication (5,10,52). So even if some exogenous mRNAs reach the cytosol, their half-life would be substantially shortened. These multiple innate immunity mechanisms against exogenous RNAs show that eukaryotic cells are under strong selective pressure to avoid transcriptome reprogramming. By preventing the entry and activity of exogenous mRNAs in the cytosol, these protective mechanisms also largely preclude possible interactions between exogenous mRNAs and the endogenous L1 machinery, and consequently lower the chances that exogenous mRNAs undergo retroposition.


However, to be effective, mRNA vaccines must overcome these innate defense mechanisms against exogenous RNAs, reach the cytosol, and be efficiently translated by ribosomes (6,10). In the case of the BNT162b2 and mRNA-1273 vaccines, this is achieved by elaborate sequence optimizations and nucleoside modifications that stabilize the synthetic mRNAs and make them largely invisible to innate defense mechanisms (1,2,6,10,52). To further protect them from the harsh extracellular environment, they are formulated in lipid nanoparticles (LNPs) that facilitate their cellular uptake and cytosol entry by endosomal escape (1,2,10,52,149). It is important to note that these remarkable engineering achievements, which improve vaccine mRNA delivery to the cytosol, inadvertently increase the chances of vaccine mRNA retroposition (Fig. 1c). This shortcoming stems from the fact that, in principle, any improvement in vaccine mRNA delivery to the cytosol increases the probability of interaction with the endogenous L1 machinery. Nevertheless, despite the increased stability and LNP formulation of vaccine mRNAs, a substantial fraction of the initial dose is degraded and never reaches the cytosol (149,153). Unfortunately, the information on BNT162b2 and mRNA-1273 accessible in the public domain does not reveal what percentage of the initial vaccine mRNA dose becomes bioavailable in the cytosol (1,2,149). In any case, any further improvement in the cytosolic delivery of vaccine mRNAs, which is a heavily pursued goal in the mRNA vaccinology field (6,10,149,153,155,156), will concomitantly increase the chances of L1-mediated retroposition (Fig. 1c).
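
The argument that better cytosolic delivery raises the retroposition risk can be made concrete with an elementary probability model. In this sketch each cytosolic vaccine mRNA molecule is assumed to be retroposed independently with a tiny per-molecule probability; that probability and the bioavailable fractions are hypothetical placeholders, since the real values are not publicly available.

```python
from math import expm1, log1p


def p_at_least_one(n_molecules, p_per_molecule):
    """P(at least one event) = 1 - (1 - p)^n, computed stably for tiny p."""
    return -expm1(n_molecules * log1p(-p_per_molecule))


DOSE_MOLECULES = 1.3e13  # BNT162b2 molecules per dose, as estimated in the text
P_PER_MOLECULE = 1e-15   # hypothetical placeholder, not a measured value

for fraction_in_cytosol in (0.001, 0.01, 0.1):  # hypothetical bioavailability
    n = fraction_in_cytosol * DOSE_MOLECULES
    print(f"{fraction_in_cytosol:>5}: {p_at_least_one(n, P_PER_MOLECULE):.2e}")
```

While the probability stays small it grows almost linearly with the bioavailable fraction, which is the qualitative point made above; the absolute numbers carry no meaning.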


Every mRNA molecule in the cytosol eventually decays through one of many degradation pathways (157,158). In contrast to exogenous vaccine mRNAs, which are not replaced once degraded (6,10,155), the levels of endogenous mRNAs are controlled by the interplay between transcription and decay (157,158). If all other parameters are ignored, this means that the probability of L1-mediated retroposition is higher for an endogenous gene with typical expression levels than for a vaccine mRNA that is only transiently present in the cell. However, several additional factors increase the chances of vaccine mRNA retroposition. The number of doses received per individual directly increases the chance of retroposition because it prolongs the time available for vaccine mRNA to encounter the L1 machinery. Currently, BNT162b2 and mRNA-1273 are administered intramuscularly as a series of two doses, three weeks and one month apart, respectively (1,2,52). Any increase in the number of required doses would further raise the chances of vaccine mRNA retroposition. This could be a particularly prominent problem if mRNA vaccines required long-term recurrent application, as in the case of the current seasonal vaccination program against influenza (159).
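
The contrast between a continuously transcribed endogenous mRNA and a transient vaccine mRNA, and the additive effect of extra doses, can be expressed as cumulative exposure in copy-hours. This sketch assumes first-order (exponential) decay of vaccine mRNA with no replenishment and a retroposition hazard proportional to exposure; the observation window and the vaccine half-life are hypothetical placeholders, and the per-cell copy numbers are the figures quoted earlier in the text.

```python
from math import log


def transient_exposure(initial_copies, half_life_h):
    """Copy-hours from one dose: integral of N0 * 2**(-t / t_half) over t >= 0."""
    return initial_copies * half_life_h / log(2)


def steady_state_exposure(copies, window_h):
    """Copy-hours for an mRNA held at a constant level by ongoing transcription."""
    return copies * window_h


def multi_dose_exposure(n_doses, initial_copies, half_life_h):
    """With linear decay kinetics, doses decay independently and exposures add."""
    return n_doses * transient_exposure(initial_copies, half_life_h)


WINDOW_H = 30 * 24       # hypothetical 30-day observation window
VACCINE_HALF_LIFE_H = 7  # placeholder; vaccine mRNA half-life is not public

endogenous = steady_state_exposure(25, WINDOW_H)             # ~25 copies/cell
vaccine_2 = multi_dose_exposure(2, 26, VACCINE_HALF_LIFE_H)  # ~26 copies/cell
print(f"endogenous gene:   {endogenous:.0f} copy-hours")
print(f"two vaccine doses: {vaccine_2:.0f} copy-hours")
```

In this picture each additional dose adds one more transient exposure term, so recurrent vaccination raises cumulative exposure linearly with the number of doses.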


Another property that influences the likelihood of vaccine mRNA genome integration is the stability of vaccine mRNA molecules. The turnover of endogenous mRNA molecules in eukaryotic cells is highly variable, with an estimated average half-life of around 7 hours (160). Precise measurements of vaccine mRNA half-life in cells are not publicly available (1,2), but it is clear that sequence and codon optimization of vaccine mRNAs increases their functional half-life, with the aim of improving their translation efficiency (6,10,27,52,160,161). This prolonged functional half-life undoubtedly increases the chances that vaccine mRNAs encounter the L1 machinery and eventually retropose into the genome. In addition, it remains unexplored how vaccine mRNAs interact with the ribonucleoprotein granules that participate in the regulation of mRNA storage and decay (28,157,162,163), as well as with the cytoplasmic L1 ribonucleoprotein particles (139).
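
How strongly half-life governs the time window for an encounter with the L1 machinery can be seen from first-order decay. This sketch uses the ~7 h average endogenous half-life cited above; the longer half-life is a hypothetical value standing in for a stabilized mRNA, because measured vaccine mRNA half-lives are not public.

```python
from math import log2


def fraction_remaining(t_h, half_life_h):
    """Fraction of the initial mRNA pool left after t hours of first-order decay."""
    return 0.5 ** (t_h / half_life_h)


def hours_until_fraction(fraction, half_life_h):
    """Time needed for the pool to decay to the given fraction."""
    return half_life_h * log2(1.0 / fraction)


for half_life in (7.0, 21.0):  # 7 h: average from the text; 21 h: hypothetical
    left_24h = fraction_remaining(24, half_life)
    t_1pct = hours_until_fraction(0.01, half_life)
    print(f"t1/2={half_life:>4} h: {left_24h:.1%} left at 24 h, "
          f"1% left after {t_1pct:.0f} h")
```

Because the decay time scales linearly with half-life, any stabilization that lengthens the functional half-life proportionally extends the window in which vaccine mRNAs coexist with active L1 machinery.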


Biodistribution profiles 
The biodistribution profile is another important parameter that determines the likelihood of vaccine mRNA genome integration, because the activity of L1 elements differs between cells, tissues and organs (94,95,109). Interestingly, direct biodistribution studies have not been conducted for the BNT162b2 vaccine (1). However, surrogate studies in mice and rats indicate distribution, in different quantities, from the injection site to most tissues, including the liver, adrenal glands, spleen and gonads (1). Direct distribution and pharmacokinetic studies were also not conducted for the mRNA-1273 vaccine, but studies in rats using the same LNPs and a cocktail of mRNAs encoding cytomegalovirus antigens indicate that these mRNAs could be detected at varying levels in all examined tissues except the kidney, including the injection-site muscle, proximal and distal lymph nodes, spleen, eyes, heart, lung, brain and testis (2). Notably, distribution of the mRNA to the ovaries was not tested because no female rats were included in this study, as explained in the regulatory documents (2). Clearly, these surrogate biodistribution profiles substantially overlap with organs known to show L1 activity, such as the liver (122), spleen (101), brain (56,94,112-116), adrenal glands (101), muscles (99,126,164) and gonads (91,100,101,104,107).
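
The overlap stated in the last sentence can be checked mechanically with set intersections. The tissue labels below are coarse, hypothetical normalizations of the lists in this paragraph (testis is counted under gonads, and the intramuscular injection site under muscle); they summarize the text and add no new data.

```python
# Surrogate biodistribution, as listed in the regulatory documents cited above.
bnt162b2_surrogate = {"muscle", "liver", "adrenal glands", "spleen", "gonads"}
mrna1273_surrogate = {
    "muscle", "lymph nodes", "spleen", "eyes", "heart", "lung", "brain", "gonads",
}
# Organs with reported L1 activity, as listed in the same paragraph.
l1_active = {"liver", "spleen", "brain", "adrenal glands", "muscle", "gonads"}

for name, tissues in (
    ("BNT162b2", bnt162b2_surrogate),
    ("mRNA-1273", mrna1273_surrogate),
):
    overlap = sorted(tissues & l1_active)
    print(f"{name}: {len(overlap)}/{len(tissues)} tissues overlap -> {overlap}")
```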


Given the quantity of vaccine mRNA in a single dose of BNT162b2 or mRNA-1273, these distribution patterns, which are neither strictly localized nor fully systemic, suggest that vaccine mRNA likely accumulates at rather high concentrations in some tissues, with the potential to saturate the exogenous mRNA uptake capacity of recipient cells (10,165). To evaluate the probability of L1-mediated retroposition more precisely, it is important to understand which cell types can take up vaccine mRNA. According to the regulatory body, dendritic cells and macrophages present at the inoculation site and in the draining lymph nodes are the two principal cell types targeted by the BNT162b2 and mRNA-1273 vaccines (166). However, the assessment report for the BNT162b2 vaccine states that it is unknown whether cells other than professional antigen-presenting cells (APCs) may transiently express the vaccine-derived spike protein (1). Similarly, the mRNA-1273 vaccine assessment report declares that the delivered vaccine mRNA is mainly expressed by macrophages and dendritic cells (2), which implies that mRNA-1273 is expressed in some other cell types as well. It is also telling that these documents do not explain the mechanism, if one exists, that would direct BNT162b2 and mRNA-1273 exclusively or preferentially to dendritic cells and macrophages (1,2,166).


Although macrophages and dendritic cells, as professional APCs, are specialized in sampling their environment, essentially all nucleated cells are endocytosis-competent. Evidence from several studies indicates that the cellular uptake of mRNA LNPs relies on apolipoprotein E (ApoE) binding to the LNPs and their subsequent endocytosis, facilitated by low-density lipoprotein (LDL) receptors (52,165,167,168). Since ApoE, LDL receptors and LDL-like receptors are expressed by many cell types throughout the body (169,170), it can be expected that APCs are not the only cell types that internalize mRNA LNPs (52,168). For example, some studies indicate that myocytes, epithelial cells and fibroblasts take up vaccine mRNA and contribute to its expression (52,171-173). These considerations suggest that cell types other than dendritic cells and macrophages most likely internalize BNT162b2 and mRNA-1273 vaccine mRNAs, and that vaccine mRNAs may encounter the L1 machinery in diverse cell types across a broad range of tissues.


Another level of complexity in the transport and uptake of LNP-formulated exogenous mRNA arises from the recent finding that, after endocytosis, LNPs containing mRNA are repackaged in late endosomes and secreted back into the extracellular space as extracellular vesicles (EVs) (Fig. 1c) (53). These vaccine mRNA EVs (endo-EVs) protect exogenous mRNA in extracellular fluids during in vivo transport to other organs, and deliver intact exogenous mRNA to the cytoplasm of distant recipient cells (53,54,174-176). Because of their small size, vaccine mRNA EVs are less visible than LNPs to innate immunity mechanisms and can pass through the vascular endothelium and the extracellular matrix (53,177). Given that many cell types, including dendritic cells (178) and macrophages (179), secrete EVs, the range of cells and tissues that exogenous mRNAs could reach is substantially broadened compared with the LNP route alone (Fig. 1c). Recent work shows that L1 mRNAs in cultured cells can also be packaged into EVs, delivered via EVs to recipient cells and retroposed into their genome (Fig. 1a) (51). Together, this suggests that EV dynamics substantially raise the odds of interaction between active L1 elements and vaccine mRNAs (Fig. 1c).


Genome integration of vaccine mRNA in somatic and germline cells (Fig. 1) is not the only possible adverse effect that should be considered. Theoretically, vaccine mRNA could also be epigenetically inherited via the sperm RNA cargo (180-183). This could happen if testis cells of the male germ line take up LNPs or EVs containing vaccine mRNAs, and these mRNAs then end up in spermatozoa (181,182,184). Alternatively, during their functional maturation in the epididymis, spermatozoa could potentially actively internalize vaccine mRNAs delivered by epididymal EVs (183,184).


Final remarks 
There are some further points that should be mentioned. Several papers report that infection of human cells by viruses, including SARS-CoV-2, increases the activity of endogenous L1 retroelements (185-188), consistent with the presumed environmental modulation of L1 activity (109). These findings suggest that, paradoxically, mRNA vaccination during an active viral infection or after a resolved one might increase the chances of vaccine mRNA genome integration. The COVID-19 vaccine mRNAs code for the SARS-CoV-2 spike protein (52), so it is important to know whether there is any evidence that SARS-CoV-2 mRNAs can integrate into the genome. Indeed, a recent study shows that upon infection, SARS-CoV-2 subgenomic mRNAs can be reverse-transcribed by L1 elements and integrated into the genome of infected cells (185). Interestingly, fragments of mRNAs closer to the 3' end of the SARS-CoV-2 genome, including spike mRNA, are more frequently integrated into the cellular DNA than sequences closer to the 5' end (185). This integration bias could be related to differences in the abundance of SARS-CoV-2 subgenomic mRNAs (189), as suggested by the authors (185). However, it could also reflect the nested architecture of subgenomic mRNAs (189), coupled with the mechanism of L1 retroposition, which relies on the poly-A tail (49) and is prone to truncate transcripts with increasing distance from the 3' end.
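
The alternative explanation proposed in the last sentence can be illustrated with a toy model. It assumes that all subgenomic mRNAs share the genome's 3' end, that L1 reverse transcription starts at the poly-A tail, and that it terminates with a constant hypothetical probability q per nucleotide; the transcript lengths and abundances are made-up numbers, not SARS-CoV-2 data.

```python
def expected_coverage(distance_from_3p, transcript_lengths, abundances, q):
    """Expected retrocopy coverage of a position at a given distance from the 3' end.

    Transcripts are 3'-coterminal: transcript i spans distances 0..L_i-1.
    Reverse transcription runs 3'->5' and survives each nucleotide with prob. 1-q.
    """
    present = sum(
        a for length, a in zip(transcript_lengths, abundances)
        if distance_from_3p < length
    )
    return present * (1.0 - q) ** distance_from_3p


LENGTHS = [1_500, 4_000, 8_000, 30_000]  # hypothetical nested transcripts (nt)
EQUAL = [1.0] * len(LENGTHS)            # equal abundances isolate the nesting effect

for d in (500, 5_000, 20_000):
    print(d, round(expected_coverage(d, LENGTHS, EQUAL, q=1e-4), 3))
```

Even with equal abundances and q = 0, positions nearer the 3' end are covered by more transcript species, so nesting and 3'-anchored truncation both push expected integrations toward the 3' end without any difference in transcript abundance.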


L1 retrotransposon activity is closely linked with replication (45,81,190,191), and it has been suggested that the retroposition of cellular mRNAs is coupled to cell division (37,60). This implies that the risk of vaccine mRNA genome integration might be higher in proliferating human cell populations. Biodistribution profiles of vaccine mRNA are not available for tumors; however, increased replication activity coupled with elevated L1 retrotransposition in tumor cells (79) makes them a favorable environment for possible vaccine mRNA genome integration. In this regard, it would be very informative to test the biodistribution profile of mRNA vaccines in murine tumor models and to look for eventual somatic retroposition events.


At first glance, it appears that the application of mRNA vaccines cannot alter primary retroposition rates at the individual and population levels. The underlying reason is that vaccine mRNAs are not directly mutagenic and that their route to potential genome integration hinges on endogenous cellular mechanisms, i.e., the activity of L1 elements that continuously operate on the available mRNA pool. Nevertheless, a possible change in primary retroposition rates should not be immediately dismissed, because without testing it cannot be excluded that vaccination with LNP-formulated mRNAs modulates L1 activity. As already explained, it is well established that many exogenous factors modify L1 activity (109), including viral infections (185-188), so the impact of mRNA vaccination should also be evaluated in this regard.


On the other hand, it is apparent that any vaccine mRNA genome integration would broaden the spectrum of conceivable sequences that could be retrocopied (Fig. 1). Our cells evolved under mutational pressure from the activity of L1 elements, which generate retrocopies of our native genes (37,40). However, transfecting human cells with exogenous, artificially modified mRNAs that could potentially be retrocopied into the genome (Fig. 1c) extends the standard mutational sequence space into the realm of transgenic modifications. It is rather clear that any possibility of transgenesis in humans raises ethical concerns that should be properly addressed.


The retroposition of a vaccine mRNA molecule is in principle a random event that can occur in any transfected cell that shows L1 activity (Fig. 1c). The clonal expansion of a new retrocopy largely depends on its phenotypic effects and the pre-existing proliferative capacity of the mutated cell. At one extreme, a vaccine mRNA retrocopy that directly inactivates an essential gene (92,192) would result in cell death, precluding any further spread of that retrocopy. However, a retrocopy that is moderately deleterious or neutral (141,193) and arises in a cell with high proliferative potential has good odds of being propagated to a large number of descendant cells. In adults, the proliferative capacity of many somatic cells is considerably limited (193,194), and it drops further with aging (195). This implies that vaccine mRNA retrocopy mosaicism in the adult soma should be largely restricted to small cell clusters or individual cells. Nevertheless, a retroposition event in a progenitor cell, an adult stem cell (196) or a premalignant cell (193) would lead to clonal expansion of the retrocopy into much larger portions of somatic tissue.


In contrast to the relatively confined effects of somatic retroposition, a possible heritable vaccine mRNA retroposition event would have a more far-reaching impact by producing fully transgenic individuals. A hypothetical vaccine mRNA retrocopy with heritable potential could arise in germ cells or in the pluripotent cells of early embryos (92). As discussed above, the documents of regulatory agencies state that surrogate biodistribution studies report distribution of LNP-formulated mRNA to gonads (1,2), which are known to display L1 activity (91,94,100,101,104-107). On the other hand, vaccine mRNA stored in the sperm RNA cargo could hypothetically reach the pluripotent cells of early embryos, which are hot spots of L1 activity (88-90,92,102,103), and undergo retroposition there. This in turn could result in somatic mosaicism in which a substantial proportion of an individual's cells become transgenic, and if the gonads are also affected, the retrocopy could become heritable (92,108).


The phenotype of a vaccine mRNA retrocopy will depend, among other factors, on the number and identity of cells that become transgenic, the insertion locus, the completeness of the inserted sequence, the orientation of the insertion, peculiarities of the recipient genome and the expression potential of the retrocopy. Although native mRNAs lack transcription-driving elements, it is well established that most of their retrocopies show evidence of transcription (38,40,41); hence, a hypothetical vaccine mRNA retrocopy could also be expected to have good chances of being expressed. Many expressed retrocopies of native genes tend to have a strong negative impact on fitness and are therefore relatively quickly purged from the population (40). It has been suggested that these deleterious effects of expressed retrocopies are often related to interference with their parental genes (40). Since a hypothetical vaccine mRNA retrocopy does not have a parental gene in the host genome (Fig. 1c), effects related to expression interference between the retrocopy and its parental gene are not possible. However, an expressed retrocopy of vaccine mRNA could interact in unpredictable ways with the host immune system, later viral infections, and vaccine mRNAs received in subsequent administration rounds.


Conclusions 

Current engineering strategies (136) and declared future directions (136,197) for the improvement of mRNA vaccines do not consider the possibility of vaccine mRNA genome integration via L1 retroelements native to human cells. This is at odds with the knowledge base on the biology of L1-based retroposition and its role in the genetics, development, and evolution of humans. Why this risk is overlooked is even more obscure given that mRNA retroposition is a biomedically recognized phenomenon outside vaccinology (42,47,58,61,62,64,65,72,74,75,78). To resolve these discrepancies between the fields, it would be critical to design and perform experimental studies on animal models that aim to detect vaccine mRNA retrocopies and estimate their retroposition frequencies. As the propensity for retroposition via L1 retroelements is sequence dependent, it would be advisable to test every mRNA therapeutic candidate independently. This information could then guide further vaccine mRNA refinements in the direction of avoiding active L1 cellular environments (198) or improving their resilience to capture by the L1 machinery (97).


Every technology is a double-edged sword, and mRNA therapeutics are no exception. In this complex COVID-19 crisis, it is essential that all the pros and cons of control measures, procedures, treatments, prophylaxis and vaccine technologies are continually and openly discussed and cautiously evaluated from many angles. Encouraging examples in this direction are recently published papers that discuss, in a balanced way, the largely ignored negative effects of COVID-19 pandemic control measures and practices on the overall human microbiome (199), the neonatal microbiome (200) and immunity (201). I hope that the possible interplay between mRNA vaccines and L1 elements presented here will also provoke debate and attract the attention of researchers in a broad range of disciplines.


Whether or not the current vaccine mRNAs can integrate into the genome, and at what frequency, ultimately has to be established by experiments. However, it remains puzzling why and how the mRNA vaccinology field neglected the retroposition biology of L1 retroelements and its theoretical links to possible vaccine mRNA retroposition, considering the volume, visibility and significance of L1 research (42,43,56,78-80,99,112) and retroposition research (36-41,43,44,47,56,62,64,72,75). The mRNA vaccinology field started its development more than 30 years ago (11,31), and L1 retroelements in humans have been studied for more than 40 years (202,203), yet evidently without any crosstalk between the two fields. This awkward silo effect indicates that on some occasions the structural drawbacks of contemporary science, despite its expansion, globalization and unprecedented dissemination, are deeper than we are willing to admit. I conclude that the broadly reiterated statement that mRNA-based therapeutics cannot impact genomes is an unfounded assumption of unclear origin. This implies that the current mRNA vaccine evaluations, lacking studies that specifically address genome integration, are insufficient to declare their genome integration safety. In this regard, it is important that the exact nucleotide sequences of mRNA vaccines be easily accessible to the public, including in product information documents (204,205), to allow unambiguous and independent tracking of possible vaccine mRNA integration in the somatic and germline genomes of already vaccinated people and their progeny.